@shijin-aws
Contributor

No description provided.

| **Secondary Caps** |efa|efa-direct|
| ------------------ |:-:|:--------:|
| `FI_FENCE` |❌|❌ |
| `FI_MULTI_RECV` |❌ |✓ |
Contributor

FI_MULTI_RECV should be the other way around

Contributor Author

Good catch, forgot to update.

This doc compares the features and implementations
of the efa and efa-direct fabrics.

Signed-off-by: Shi Jin <[email protected]>
@shijin-aws
Contributor Author

bot:aws:retest

@shijin-aws requested a review from amitrad-aws on January 6, 2026 18:47

The **`efa` fabric** implements a comprehensive set of [wire protocols](efa_rdm_protocol_v4.md) that include emulations to support capabilities beyond what the EFA device natively provides. This allows broader libfabric feature support and application compatibility, but results in a more complex code path with additional protocol overhead.

The **`efa-direct` fabric** offers a more direct approach that mostly exposes only what the EFA NIC hardware natively supports. This results in a more compact and efficient code path with reduced protocol overhead, but requires applications to work within the constraints of the hardware capabilities.

Why "mostly"?

Contributor

We expose counters which are not natively supported at the moment

**Tx Post:**
- Constructs Work Queue Entry (WQE) directly from application calls (`fi_*` functions)
- Maintains 1-to-1 mapping between WQE and libfabric call
- Only performs two operations before data is sent over wire:

Sent to the NIC, not sent over the wire

**Tx Post:**
- Allocates internal data structure called `efa_rdm_ope` (EFA-RDM operational entry)
- Maintains 1-to-1 mapping between `efa_rdm_ope` and libfabric call (`fi_*` functions)
- Chooses appropriate protocol based on operation type and message size

And EFA NIC capabilities
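
The protocol-selection bullet, together with the review note about NIC capabilities, can be illustrated with a small model. This is only a sketch under stated assumptions, not the provider's actual logic: the function name `choose_protocol`, the thresholds `EAGER_MAX` and `MEDIUM_MAX`, and the simple size-based ordering are all hypothetical. The protocol labels are borrowed loosely from the wire-protocol document, and the sketch covers two-sided sends only.

```python
from enum import Enum


class Protocol(Enum):
    EAGER = "eager"
    MEDIUM = "medium"
    LONG_CTS = "long-cts"
    LONG_READ = "long-read"


# Hypothetical thresholds in bytes, chosen only for illustration.
EAGER_MAX = 8 * 1024
MEDIUM_MAX = 64 * 1024


def choose_protocol(msg_len: int, device_rdma_read: bool) -> Protocol:
    """Pick a send protocol from message size and one NIC capability flag."""
    if msg_len < 0:
        raise ValueError("msg_len must be non-negative")
    if msg_len <= EAGER_MAX:
        return Protocol.EAGER
    if msg_len <= MEDIUM_MAX:
        return Protocol.MEDIUM
    # For large messages the capability flag decides (assumed ordering).
    if device_rdma_read:
        return Protocol.LONG_READ
    return Protocol.LONG_CTS
```

The point of the sketch is that the same `fi_*` call can take different paths depending on both the message and the device, which is the source of the extra code-path complexity in the `efa` fabric.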

- Each `pke` corresponds to a WQE that interacts with EFA device
- One operation entry can map to multiple packet entries (e.g., a 16KB message can be sent via 2 packet entries)
- **Note**: For RMA operations (`fi_read`/`fi_write`), such workflow still applies, but when device RDMA is available, the data goes directly to/from user buffers without internal staging or copying. Since efa fabric supports unlimited
size for RMA, when the libfabric message is larger than the max rdma size of the device, it consume use multiple packet entries.
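
The one-to-many mapping from an operation entry to packet entries can be sketched as a simple segmentation step. This is an illustrative model, not provider code: the helper name `split_into_packets` is hypothetical, and the 8 KiB per-packet payload is an assumption picked so that the 16KB example above yields 2 packet entries. The same arithmetic would apply to an RMA request larger than the device's maximum RDMA size, with that limit used as `max_payload`.

```python
from typing import List, Tuple


def split_into_packets(msg_len: int, max_payload: int) -> List[Tuple[int, int]]:
    """Return one (offset, length) segment per packet entry."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if msg_len < 0:
        raise ValueError("msg_len must be non-negative")
    if msg_len == 0:
        # Assumption of this sketch: a zero-byte message still uses one entry.
        return [(0, 0)]
    return [
        (offset, min(max_payload, msg_len - offset))
        for offset in range(0, msg_len, max_payload)
    ]


# Hypothetical per-packet payload of 8 KiB.
segments = split_into_packets(16 * 1024, 8 * 1024)
```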

"it consume use multiple packet entries", did you mean it consumes and uses?


- Both support FI_EP_RDM for reliable datagram.

- FI_EP_DGRAM is only supported by efa fabric. Though it uses the same code path as efa-direct, it is kept in efa fabric for backward compatibility.

This doesn't explain why it is not supported in efa-direct. The reason is there is no large overhead in "efa" fabric with this, right?




| **Modes** |efa|efa-direct|

Why is the efa column empty?

| **Other Libfabric Features** |efa|efa-direct|
| ---------------------------- |:-:|:--------:|
| FI_RM_ENABLED |\*|\* |
| fi_counter |✓ |✓ |

Do we support counters in efa-direct?

Contributor

yes

EFA->>Device: Poll device CQ
Device->>EFA: Device completion (pke)
EFA->>EFA: Find ope from pke
EFA->>Queue: Search Rx queue for match

You only addressed RX CQEs here, maybe we should add TX ones as well (each in optional)


**Rx Post:**
- Pre-posts internal Rx buffers to device for incoming data from peers
- User buffers from `fi_recv` calls are queued in internal libfabric queue (not posted to device)
Contributor

They can be if we emulate the send/recv with RDMA read
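
The internal queue of user receive buffers described in the Rx Post bullets can be modeled roughly as follows. This is a minimal sketch, not the provider's data structure: `PostedRecv` and `RxQueue` are hypothetical names, and the tag comparison follows the usual libfabric tagged-receive convention in which bits set in `ignore` are excluded from matching. The RDMA-read emulation case mentioned in the comment above is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class PostedRecv:
    buf: bytearray
    tag: int = 0
    ignore: int = 0  # bits set here are not compared


class RxQueue:
    """User receives wait here; they are not posted to the device."""

    def __init__(self) -> None:
        self._posted: Deque[PostedRecv] = deque()

    def post(self, entry: PostedRecv) -> None:
        self._posted.append(entry)

    def match(self, tag: int) -> Optional[PostedRecv]:
        """Remove and return the oldest posted receive matching the tag."""
        for i, entry in enumerate(self._posted):
            if (entry.tag | entry.ignore) == (tag | entry.ignore):
                del self._posted[i]
                return entry
        return None


def deliver(queue: RxQueue, tag: int, payload: bytes) -> bool:
    """Copy a payload from an internal Rx buffer into a matched user buffer."""
    entry = queue.match(tag)
    if entry is None or len(payload) > len(entry.buf):
        return False
    entry.buf[: len(payload)] = payload
    return True
```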

- Polls device CQ for completion of packet entries posted to EFA device
- Finds corresponding operation entries stored in packet entry structures
- Uses counters and metadata in operation entry to track completion progress
- Generates libfabric completion when operation entry has all required data
Contributor

We also copy and stage the CQEs - which is another difference between efa and efa-direct
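
The completion bullets can be sketched as a small progress loop. This is a simplified model under stated assumptions, not the provider's implementation: `OperationEntry`, `PacketEntry`, and `progress` are hypothetical names, a single byte counter stands in for the "counters and metadata", and a plain Python list stands in for the staged libfabric completion queue.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class OperationEntry:
    total_len: int
    bytes_completed: int = 0

    def on_packet_completion(self, pkt_len: int) -> bool:
        """Account for one packet; return True once the operation is done."""
        self.bytes_completed += pkt_len
        return self.bytes_completed >= self.total_len


@dataclass
class PacketEntry:
    ope: OperationEntry  # back-pointer used to find the operation entry
    length: int


def progress(device_completions: Iterable[PacketEntry],
             app_cq: List[OperationEntry]) -> None:
    """Process device completions; stage one app completion per finished ope."""
    for pke in device_completions:
        if pke.ope.on_packet_completion(pke.length):
            app_cq.append(pke.ope)
```

In this model a 16 KiB operation split into two 8 KiB packet entries produces nothing after the first device completion and exactly one application completion after the second.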


***

| **Endpoint Types** |efa|efa-direct|
Contributor

Would you like to add these here? Because it would be a duplicate of https://github.com/ofiwg/libfabric/wiki/Provider-Feature-Matrix and we would need to maintain it in both places

participant Provider as EFA Provider
participant Application

Note over DeviceCQ,Application: Optimized Completion Path
Contributor

Add lines here

Application->>Provider: Poll for completion
Provider->>DeviceCQ: Poll for completion


4 participants